Tiger1218

The pcidevice collector panics with a "nil pointer dereference" error
on some systems. This occurs when a PCI device lacks certain attributes
in sysfs, such as 'current_link_speed' or 'current_link_width'.

The collector's Update method dereferenced the corresponding pointer
fields without first checking them for nil, which caused the crash.

This commit introduces checks to verify that the link speed and width
pointers are not nil before they are dereferenced. If an attribute is
missing for a device, its corresponding metric will now be reported as 0,
making the collector more robust and preventing node_exporter from
crashing.

Signed-off-by: Tiger1218 <tiger1218@foxmail.com>
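
To illustrate the guard described above, here is a minimal, self-contained sketch. The `pciDevice` struct and the `valueOrZero` helper are hypothetical stand-ins written for this example, not the collector's actual code; they only show the "check for nil, fall back to 0" pattern.

```go
package main

import "fmt"

// pciDevice is a hypothetical stand-in for the device data the collector reads.
// Optional sysfs attributes are modeled as pointers that stay nil when absent.
type pciDevice struct {
	CurrentLinkSpeed *float64 // nil when sysfs has no usable value
	CurrentLinkWidth *float64 // nil when sysfs has no usable value
}

// valueOrZero dereferences p only when it is non-nil and returns 0 otherwise.
func valueOrZero(p *float64) float64 {
	if p == nil {
		return 0
	}
	return *p
}

func main() {
	speed := 8.0
	devices := []pciDevice{
		{CurrentLinkSpeed: &speed}, // width attribute missing
		{},                         // both attributes missing
	}
	for i, d := range devices {
		// No dereference happens unless the pointer was checked first.
		fmt.Printf("device %d: speed=%v width=%v\n",
			i, valueOrZero(d.CurrentLinkSpeed), valueOrZero(d.CurrentLinkWidth))
	}
}
```

The trade-off of this pattern is that a missing attribute and a genuine value of 0 become indistinguishable in the exported metric.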

yuuki commented Oct 19, 2025

I have also encountered this panic error.

In my case, I got the following stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa7e6cd]
goroutine 56 [running]:
github.com/prometheus/node_exporter/collector.(*pcideviceCollector).Update(0xc0002ec640, 0xc0003ea230)
        /home/tsubouchi/node_exporter/collector/pcidevice_linux.go:129 +0x88d
github.com/prometheus/node_exporter/collector.execute({0xcbdf9e, 0x9}, {0xdf17a0, 0xc0002ec640}, 0xc0003ea230, 0xc0001a8470)
        /home/tsubouchi/node_exporter/collector/collector.go:160 +0x85
github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1({0xcbdf9e?, 0x0?}, {0xdf17a0?, 0xc0002ec640?})
        /home/tsubouchi/node_exporter/collector/collector.go:151 +0x33
created by github.com/prometheus/node_exporter/collector.NodeCollector.Collect in goroutine 43
        /home/tsubouchi/node_exporter/collector/collector.go:150 +0xce

The issue occurs in my environment, where current_link_speed reads Unknown:

$ cat /sys/bus/pci/devices/0000:00:00.0/current_link_speed
Unknown

This PR's fix addresses exactly what I'm experiencing.
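
One plausible way a value such as `Unknown` could end up as a nil pointer is sketched below. The `parseLinkSpeed` helper is hypothetical and is not taken from node_exporter or its sysfs parsing library; the sample input `8.0 GT/s PCIe` is likewise only an illustrative value. The sketch assumes the parser keeps the leading number and reports anything unparseable as nil.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLinkSpeed is a hypothetical helper. It returns the leading number of a
// sysfs link-speed string, or nil when the string does not start with a number.
func parseLinkSpeed(raw string) *float64 {
	fields := strings.Fields(raw)
	if len(fields) == 0 {
		return nil // empty attribute
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return nil // e.g. "Unknown"
	}
	return &v
}

func main() {
	for _, raw := range []string{"8.0 GT/s PCIe", "Unknown", ""} {
		if p := parseLinkSpeed(raw); p != nil {
			fmt.Printf("%q parsed to %v\n", raw, *p)
		} else {
			fmt.Printf("%q has no usable value (nil)\n", raw)
		}
	}
}
```

Any caller that dereferences the result without a nil check would panic on the `Unknown` input, which matches the symptom in the stack trace above.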

var maxLinkWidth, currentLinkWidth float64

if device.MaxLinkSpeed != nil {
	maxLinkSpeedTS = float64(int64(*device.MaxLinkSpeed * 1e9))
Member

This will emit a 0 for metrics we don't know the value for.

I think we should do something like this:

Suggested change
maxLinkSpeedTS = float64(int64(*device.MaxLinkSpeed * 1e9))
ch <- c.descs[0].mustNewConstMetric(maxLinkSpeedTS, device.Location.Strings()...)
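
The idea behind the suggestion, emitting a metric only when its value is known, can be sketched as follows. This is a simplified illustration under stated assumptions: the `sample` type and `emitIfKnown` function are hypothetical and stand in for the Prometheus metric channel, so the example runs without external dependencies.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for an exported metric.
type sample struct {
	name  string
	value float64
}

// emitIfKnown sends a sample only when p is non-nil. An unknown value
// produces no sample at all instead of a misleading 0.
func emitIfKnown(ch chan<- sample, name string, p *float64) {
	if p == nil {
		return
	}
	ch <- sample{name: name, value: *p}
}

func main() {
	ch := make(chan sample, 2)
	speed := 8.0
	emitIfKnown(ch, "max_link_speed", &speed)
	emitIfKnown(ch, "current_link_speed", nil) // skipped: value unknown
	close(ch)
	for s := range ch {
		fmt.Printf("%s %v\n", s.name, s.value)
	}
}
```

With this shape, a device without a usable attribute simply has no time series for that metric, so dashboards and alerts can tell "unknown" apart from a real zero.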

Member

SuperQ left a comment

This needs a rebase.

Member

SuperQ left a comment

Needs rebase
